Abstract. We study combinatorial group testing schemes for learning $d$-sparse boolean vectors using highly unreliable disjunctive measurements. We consider an adversarial noise model that only limits the number of false observations, and show that any noise-resilient scheme in this model can only approximately reconstruct the sparse vector. On the positive side, we give a general framework for the construction of highly noise-resilient group testing schemes using randomness condensers. Simple randomized instantiations of this construction give non-adaptive measurement schemes, with $m = O(d \log n)$ measurements, that allow efficient reconstruction of $d$-sparse vectors up to $O(d)$ false positives even in the presence of $\delta m$ false positives and $\Omega(m/d)$ false negatives within the measurement outcomes, for any constant $\delta < 1$. None of these parameters can be substantially improved without dramatically affecting the others. Furthermore, we obtain several explicit (and incomparable) constructions, in particular one matching the randomized trade-off but using $m = O(d^{1+o(1)} \log n)$ measurements. We also obtain explicit constructions that allow fast reconstruction in time $\mathrm{poly}(m)$, which would be sublinear in $n$ for sufficiently sparse vectors.



1 Introduction 

Group testing is an area in applied combinatorics that deals with the following problem: Suppose that in a large population of individuals, it is suspected that a small number possess a condition or property that can only be certified by carrying out a particular test. Moreover, suppose that a pooling strategy is permissible, namely, that it is possible to perform a test on a chosen group of individuals in parallel, in which case the outcome of the test would be positive if at least one of the individuals in the group possesses the condition. The trivial strategy would be to test each individual separately, which takes as many tests as the population size. The basic question in group testing is: how can we do better? This question is believed to have been first posed by Dorfman [1] during the screening of draftees in World War II. In this scenario, blood samples are drawn from a large number of people and tested for a particular disease. If a number of samples are pooled in a group on which the test is applied, the outcome would be positive if at least one of the samples in the group carries a particular antigen indicating the disease. Since then, group testing has been applied for a wide range of purposes, from testing for defective items (e.g., defective light bulbs or resistors) as a part of industrial quality assurance [2] to DNA sequencing [3] and DNA library screening in molecular biology (see, e.g., [4,5,6,7,8] and the references therein), and less obvious applications such as multiaccess communication [9], data compression [10], pattern matching [11], streaming algorithms [12], software testing [13], and compressed sensing [14], to name a few. Moreover, over the decades, a vast array of tools and techniques has been developed for various settings of the problem that we cannot thoroughly survey here due to space restrictions. Instead, we refer the reader to the books by Du and Hwang [15,16] for a detailed account of the major developments in this area.

More formally, the basic goal in group testing is to reconstruct a $d$-sparse¹ boolean vector² $x \in \mathbb{F}_2^n$, for a known integer parameter $d > 0$, from a set of observations. Each observation is the outcome of a measurement that outputs the bitwise OR of a prescribed subset of the coordinates in $x$. Hence, a measurement can be seen as a binary vector in $\mathbb{F}_2^n$ which is the characteristic vector of the subset of the coordinates being combined together. More generally, a set of $m$ measurements can be seen as an $m \times n$ binary matrix (that we call the measurement matrix) whose rows define the individual measurements.

In this work we study group testing in the presence of highly unreliable measurements that can produce false outcomes. We will mainly focus on situations where up to a constant fraction of the measurement outcomes can be incorrect. Moreover, we will mainly restrict our attention to non-adaptive measurements, i.e., the case in which the measurement matrix is fully determined before the observation outcomes are known. Non-adaptive measurements are particularly important for applications, as they allow the tests to be performed independently and in parallel, which saves significant time and cost.

On the negative side, we show that when the measurements are allowed to be highly noisy, the original vector $x$ cannot be uniquely reconstructed. Thus, in this case it is inevitable to resort to approximate reconstructions, i.e., producing a sparse vector $\hat{x}$ that is close to the original vector in Hamming distance. In particular, our result shows that if a constant fraction of the measurements can go wrong, the reconstruction might differ from the original vector in $\Omega(d)$ positions, irrespective of the number of measurements. For most applications this might be an unsatisfactory situation, as even a close estimate of the set of positives might not reveal whether any particular individual is defective or not, and in certain scenarios (such as an epidemic disease or industrial quality assurance) it is unacceptable to miss any affected individuals. This motivates us to focus on approximate reconstructions with one-sided error. Namely, we will require that the support of $\hat{x}$ contain the support of $x$ and be possibly larger by up to $O(d)$ positions. It can be argued that, for most applications, such a scheme is as good as exact reconstruction, as it allows one to significantly narrow down the set of defectives to up to $O(d)$ candidate positives. In particular, as observed in [17], one can use a second stage if necessary and individually test the resulting set of candidates to identify the exact set of positives, hence resulting in a so-called trivial two-stage group testing algorithm. Next, we will show that in any scheme that produces few or no false negatives in the reconstruction, only up to an $O(1/d)$ fraction of false negatives (i.e., observations of 0 instead of 1) in the measurements can be tolerated, while there is no such restriction on the amount of tolerable false positives. Thus, one-sided approximate reconstruction breaks the symmetry between false positives and false negatives in our error model.

¹ We define a $d$-sparse vector as a vector whose number of nonzero coefficients is at most $d$.

² We use the notation $\mathbb{F}_q$ for a field of size $q$. Occasionally we adapt this notation to denote a set of size $q$ as well, even if we do not need the underlying field structure.

On the positive side, we give a general construction for noise-resilient measurement matrices that guarantees approximate reconstructions up to $O(d)$ false positives. Our main result is a general reduction from the noise-resilient group testing problem to the construction of well-studied combinatorial objects known as randomness condensers, which play an important role in theoretical computer science. Different qualities of the underlying condenser correspond to different qualities of the resulting group testing scheme, as we describe later. Using the state of the art in derandomization theory, we obtain different instantiations of our framework with incomparable properties, summarized in Table 1. In particular, the resulting randomized constructions (obtained from optimal lossless condensers and extractors) can be set to tolerate (with overwhelming probability) any constant fraction ($< 1$) of false positives and an $\Omega(1/d)$ fraction of false negatives, and produce an accurate reconstruction up to $O(d)$ false positives (where the positive constant behind $O(\cdot)$ can be made arbitrarily small), which is the best trade-off one can hope for, all using only $O(d \log n)$ measurements. This almost matches the information-theoretic lower bound $\Omega(d \log(n/d))$ shown by simple counting. We will also show explicit (deterministic) constructions that can approach the optimal trade-off, and finally, those that are equipped with fully efficient reconstruction algorithms with running time polynomial in the number of measurements.

Related Work. There is a large body of work in the group testing literature that is related to the present work; in this short presentation, we are only able to discuss a few with the highest relevance. The exact group testing problem in the noiseless scenario is handled by what is known as superimposed coding (see [18,19]) or the closely related concepts of cover-free families or disjunct matrices³. It is known that, even for the noiseless case, exact reconstruction of $d$-sparse signals (when $d$ is not too large) requires at least $\Omega(d^2 \log n / \log d)$ measurements (several proofs of this fact are known, e.g., [21,22,23]). An important class of superimposed codes is constructed from combinatorial designs, among which we mention the construction based on MDS codes given by Kautz and Singleton [24], which, in the group testing notation, achieves $O(d^2 \log^2 n)$ measurements⁴.

³ A $d$-superimposed code is a collection of binary vectors with the property that from the bitwise OR of up to $d$ words in the family one can uniquely identify the comprising vectors. A $d$-cover-free family is a collection of subsets of a universe, none of which is contained in the union of up to $d$ of the other subsets. These notions are extended to the noisy setting, e.g., in [20].
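The Kautz–Singleton idea mentioned above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the construction as given in [24]: it assumes a prime $q$ (so integers modulo $q$ form $\mathbb{F}_q$), and the helper names `kautz_singleton` and `is_disjunct` are hypothetical. Each column corresponds to a polynomial of degree less than $k$ over $\mathbb{F}_q$ (a Reed–Solomon codeword, which is MDS), and each of its $q$ symbols is expanded into a unit vector of length $q$, giving a $q^2 \times q^k$ matrix. Since two distinct such polynomials agree on at most $k - 1$ points, the matrix is $d$-disjunct whenever $d(k-1) < q$; the brute-force checker is only practical for toy sizes.

```python
from itertools import combinations, product


def kautz_singleton(q, k):
    """Return the q^2 x q^k Kautz-Singleton matrix as a list of rows (q prime).

    Column j corresponds to the polynomial with coefficient tuple polys[j];
    it has a 1 in row a*q + p(a) for every evaluation point a in F_q.
    """
    polys = list(product(range(q), repeat=k))
    matrix = [[0] * len(polys) for _ in range(q * q)]
    for j, coeffs in enumerate(polys):
        for a in range(q):
            value = sum(c * pow(a, e, q) for e, c in enumerate(coeffs)) % q
            matrix[a * q + value][j] = 1
    return matrix


def is_disjunct(matrix, d):
    """Brute force: no column is covered by the union of any d other columns."""
    n_cols = len(matrix[0])
    cols = [frozenset(i for i, row in enumerate(matrix) if row[j])
            for j in range(n_cols)]
    for j, col in enumerate(cols):
        others = cols[:j] + cols[j + 1:]
        for group in combinations(others, d):
            if col <= frozenset().union(*group):
                return False
    return True


# Toy check: q = 5, k = 2 gives a 25 x 25 matrix with d(k - 1) = 2 < 5.
M = kautz_singleton(5, 2)
print(len(M), len(M[0]), is_disjunct(M, 2))  # 25 25 True
```

With $q \approx d \log n$ and $k \approx \log n / \log q$ this yields the quadratic-in-$d$ number of rows quoted above, which is what the condenser-based framework of this paper improves on for approximate reconstruction.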



Approximate reconstruction of sparse vectors up to a small number of false positives (one focus of this work) has been studied as a major ingredient of trivial two-stage schemes [17,7,25,26,27,8]. In particular, a generalization of superimposed codes, known as selectors, was introduced in [26], which, roughly speaking, allows for identification of the sparse vector up to a prescribed number of false positives. The authors of [26] gave a non-constructive result showing that there are such (non-adaptive) schemes that keep the number of false positives at $O(d)$ using $O(d \log(n/d))$ measurements, matching the optimal "counting bound". A probabilistic construction of asymptotically optimal selectors (resp., a related notion of resolvable matrices) is given in [8] (resp., [27]), and [28,29] give slightly suboptimal "explicit" constructions based on certain expander graphs obtained from dispersers⁵.

To give a concise comparison of the present work with those listed above, we mention some of the qualities of group testing schemes that we will aim to attain: (1) a low number of measurements; (2) an arbitrarily good degree of approximation; (3) maximum possible noise tolerance; (4) efficient, deterministic construction: as the sparsity $d$ is typically very small compared to $n$, a measurement matrix should ideally be fully explicitly constructible, in the sense that each entry of the matrix should be computable in deterministic time $\mathrm{poly}(d, \log n)$ (e.g., while the constructions in [26,27,8,28,29] are all polynomial-time computable in $n$, they are not fully explicit in this sense); (5) a fully efficient reconstruction algorithm: for a similar reason, the length of the observation vector is typically far smaller than $n$; thus, it is desirable to have a reconstruction algorithm that identifies the support of the sparse vector in time polynomial in the number of measurements (which might be exponentially smaller than $n$). While the works that we mentioned focus on a few of the criteria listed above (e.g., none of the above-mentioned schemes for approximate group testing are equipped with a fully efficient reconstruction algorithm), our approach can potentially attain all of them at the same time. As we will see later, using the best known constructions of condensers we will have to settle for suboptimal results in one or more of the aspects above. Nevertheless, the fact that any improvement in the construction of condensers would readily translate to improved group testing schemes (together with the rapid growth of derandomization theory) justifies the significance of the construction given in this work.

⁴ Interestingly, this classical construction can be regarded as a special instantiation of our framework where a "bounded degree univariate polynomial" is used in place of the underlying randomness condenser. However, the analysis and the properties of the resulting group testing schemes substantially differ for the two cases, and in particular, the MDS-based construction owes its properties essentially to the large distance of the underlying code. In Appendix B we will elaborate in more detail on this correspondence as well as a connection with the bit-probe model in data structures.

⁵ The notion of selectors is useful in a noiseless setting. However, as remarked in [?], it can be naturally extended to include a "noise" parameter, and the probabilistic constructions of selectors can be naturally extended to this case. Nonetheless, this generalization does not distinguish between false positives and negatives, and the explicit constructions of selectors [28,29] cannot be used in a (highly) noisy setting.

2 Preliminaries 

For non-negative integers $e_0$ and $e_1$, we say that an ordered pair of binary vectors $(x, y)$, each in $\mathbb{F}_2^n$, are $(e_0, e_1)$-close (or $x$ is $(e_0, e_1)$-close to $y$) if $y$ can be obtained from $x$ by flipping at most $e_0$ bits from 0 to 1 and at most $e_1$ bits from 1 to 0. Hence, such $x$ and $y$ will be $(e_0 + e_1)$-close in Hamming distance. Further, $(x, y)$ are called $(e_0, e_1)$-far if they are not $(e_0, e_1)$-close. Note that if $x$ and $y$ are seen as characteristic vectors of subsets $X$ and $Y$ of $[n]$, respectively⁶, they are $(|Y \setminus X|, |X \setminus Y|)$-close. Furthermore, $(x, y)$ are $(e_0, e_1)$-close iff $(y, x)$ are $(e_1, e_0)$-close. A group of $m$ non-adaptive measurements for binary vectors of length $n$ can be seen as an $m \times n$ matrix (that we call the measurement matrix) whose $(i, j)$th entry is 1 iff the $j$th coordinate of the vector is present in the disjunction defining the $i$th measurement. For a measurement matrix $A$ we denote by $A[x]$ the outcome of the measurements defined by $A$ on a binary vector $x$, that is, the bitwise OR of those columns of $A$ chosen by the support of $x$. As motivated by our negative results, for the specific setting of the group testing problem that we are considering in this work, it is necessary to give an asymmetric treatment that distinguishes between inaccuracies due to false positives and false negatives. Thus, we will work with a notion of error-tolerating measurement matrices that directly and conveniently captures this requirement, as given below:

Definition 1. Let $m, n, d, e_0, e_1, e_0', e_1'$ be integers. An $m \times n$ measurement matrix $A$ is called $(e_0, e_1, e_0', e_1')$-correcting for $d$-sparse vectors if, for every $y \in \mathbb{F}_2^m$ there exists $z \in \mathbb{F}_2^n$ (called a valid decoding of $y$) such that for every $d$-sparse $x \in \mathbb{F}_2^n$, whenever $(x, z)$ are $(e_0', e_1')$-far, $(A[x], y)$ are $(e_0, e_1)$-far. The matrix $A$ is called fully explicit if each entry of the matrix can be computed in time $\mathrm{poly}(\log n)$.
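To make these notions concrete, here is a brute-force sketch of $A[x]$, of $(e_0, e_1)$-closeness, and of Definition 1 for tiny matrices. It is a minimal illustration, assuming matrices and vectors are Python lists of 0/1 integers; the function names are hypothetical, and the exhaustive search over all $y \in \mathbb{F}_2^m$ and $z \in \mathbb{F}_2^n$ is exponential, so it is only meant for toy examples. It checks the contrapositive form of the definition: every $d$-sparse $x$ whose outcome $A[x]$ is $(e_0, e_1)$-close to $y$ must be $(e_0', e_1')$-close to the decoding $z$.

```python
from itertools import combinations, product


def measure(A, x):
    """A[x]: bitwise OR of the columns of A selected by the support of x."""
    return [int(any(a and b for a, b in zip(row, x))) for row in A]


def is_close(x, y, e0, e1):
    """(x, y) are (e0, e1)-close: y arises from x by at most e0 flips
    0 -> 1 and at most e1 flips 1 -> 0."""
    up = sum(1 for a, b in zip(x, y) if a == 0 and b == 1)
    down = sum(1 for a, b in zip(x, y) if a == 1 and b == 0)
    return up <= e0 and down <= e1


def sparse_vectors(n, d):
    """All vectors in F_2^n with at most d nonzero coordinates."""
    for w in range(d + 1):
        for support in combinations(range(n), w):
            yield [1 if j in support else 0 for j in range(n)]


def is_correcting(A, d, e0, e1, e0p, e1p):
    """Exhaustively test whether A is (e0, e1, e0p, e1p)-correcting."""
    m, n = len(A), len(A[0])
    xs = list(sparse_vectors(n, d))
    outcomes = [measure(A, x) for x in xs]
    for y in product((0, 1), repeat=m):
        # d-sparse vectors that could explain y under the allowed noise
        candidates = [x for x, ax in zip(xs, outcomes) if is_close(ax, y, e0, e1)]
        if not any(all(is_close(x, z, e0p, e1p) for x in candidates)
                   for z in product((0, 1), repeat=n)):
            return False
    return True


# One pooled test on two items cannot tell which item is positive,
# but it identifies the positive up to one false positive.
pool = [[1, 1]]
print(is_correcting(pool, 1, 0, 0, 0, 0), is_correcting(pool, 1, 0, 0, 1, 0))  # False True
```

The pooled example already shows the asymmetry the definition is designed for: allowing $e_0' = 1$ false positive in the decoding turns an ambiguous scheme into a valid one.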

Intuitively, the definition states that two measurement outcomes are allowed to be confused only if they are produced from close vectors. In particular, an $(e_0, e_1, e_0', e_1')$-correcting matrix gives a group testing scheme that reconstructs the sparse vector up to $e_0'$ false positives and $e_1'$ false negatives even in the presence of $e_0$ false positives and $e_1$ false negatives in the measurement outcome. Under this notation, unique decoding would be possible using an $(e_0, e_1, 0, 0)$-correcting matrix if the amount of measurement errors is bounded by at most $e_0$ false positives and $e_1$ false negatives. However, when $e_0' + e_1'$ is positive, decoding may require a bounded amount of ambiguity, namely, up to $e_0'$ false positives and $e_1'$ false negatives in the decoded sequence. In the combinatorics literature, the special case of $(0, 0, 0, 0)$-correcting matrices is known as $d$-superimposed codes or $d$-separable matrices and is closely related to the notions of $d$-cover-free families and $d$-disjunct matrices (cf. [?] for precise definitions). Also, $(0, 0, e_0', 0)$-correcting matrices are related to the notion of selectors in [26] and resolvable matrices in [27].

⁶ We use the shorthand $[n]$ for the set $\{1, 2, \ldots, n\}$.

The min-entropy of a distribution $\mathcal{X}$ with finite support $S$ is given by $H_\infty(\mathcal{X}) := \min_{x \in S} \{-\log \Pr_{\mathcal{X}}(x)\}$, where $\Pr_{\mathcal{X}}(x)$ is the probability that $\mathcal{X}$ assigns to $x$. The statistical distance of two distributions $\mathcal{X}$ and $\mathcal{Y}$ defined on the same finite space $S$ is given by $\frac{1}{2} \sum_{s \in S} |\Pr_{\mathcal{X}}(s) - \Pr_{\mathcal{Y}}(s)|$, which is half the $\ell_1$ distance of the two distributions when regarded as vectors of probabilities over $S$. Two distributions $\mathcal{X}$ and $\mathcal{Y}$ are said to be $\epsilon$-close if their statistical distance is at most $\epsilon$. We will use the shorthand $\mathcal{U}_n$ for the uniform distribution on $\mathbb{F}_2^n$, and $X \sim \mathcal{X}$ for a random variable $X$ drawn from a distribution $\mathcal{X}$. A function $C : \mathbb{F}_2^n \times \mathbb{F}_2^t \to \mathbb{F}_2^\ell$ is a strong $k \to_\epsilon k'$ condenser if for every distribution $\mathcal{X}$ on $\mathbb{F}_2^n$ with min-entropy at least $k$, random variable $X \sim \mathcal{X}$ and a seed $Y \sim \mathcal{U}_t$, the distribution of $(Y, C(X, Y))$ is $\epsilon$-close to some distribution $(\mathcal{U}_t, \mathcal{Z})$ with min-entropy at least $t + k'$. The parameters $\epsilon$, $k - k'$, and $\ell - k'$ are called the error, the entropy loss and the overhead of the condenser, respectively. A condenser with zero entropy loss is called lossless, and a condenser with zero overhead is called a strong $(k, \epsilon)$-extractor. A condenser is explicit if it is polynomial-time computable.
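The two distributional quantities used in the definition of a condenser can be computed directly for small, explicitly given distributions. This is a minimal sketch, assuming distributions are represented as Python dictionaries mapping outcomes to probabilities and that logarithms are taken to base 2; the function names are hypothetical.

```python
import math


def min_entropy(dist):
    """H_inf(X) = min over the support of -log2 Pr[X = x]."""
    return min(-math.log2(p) for p in dist.values() if p > 0)


def statistical_distance(p, q):
    """Half the l1 distance between two distributions on a common finite space."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(s, 0.0) - q.get(s, 0.0)) for s in support)


def is_eps_close(p, q, eps):
    """True iff the statistical distance of p and q is at most eps."""
    return statistical_distance(p, q) <= eps


uniform_2bits = {s: 0.25 for s in ("00", "01", "10", "11")}
skewed = {"00": 0.5, "01": 0.25, "10": 0.25}
print(min_entropy(uniform_2bits), min_entropy(skewed))  # 2.0 1.0
print(statistical_distance(uniform_2bits, skewed))      # 0.25
```

Min-entropy is governed by the single most likely outcome, which is why the skewed distribution drops to one bit even though three of its outcomes are possible.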

3 Negative Results 

In coding theory, it is possible to construct codes that can tolerate up to a constant fraction of adversarially chosen errors and still guarantee unique decoding. Hence it is natural to wonder whether a similar possibility exists in group testing, namely, whether there is a measurement matrix that is robust against a constant fraction of adversarial errors and still recovers the measured vector exactly. Below we show that this is not possible⁷:

Lemma 2. Suppose that an $m \times n$ measurement matrix $A$ is $(e_0, e_1, e_0', e_1')$-correcting for $d$-sparse vectors. Then $(\max\{e_0, e_1\} + 1)/(e_0' + e_1' + 1) \le m/d$. □

⁷ We remark that the negative results in this section hold for both adaptive and non-adaptive measurements.

The above lemma (proved in Appendix C.1) gives a trade-off between the tolerable error in the measurements and the reconstruction error. In particular, for unique decoding to be possible ($e_0' = e_1' = 0$), the lemma forces $\max\{e_0, e_1\} \le m/d - 1$, so one can only guarantee resiliency against up to an $O(1/d)$ fraction of errors in the measurements. On the other hand, tolerance against a constant fraction of errors would make an ambiguity of order $\Omega(d)$ in the decoding inevitable: if $\max\{e_0, e_1\} \ge \delta m$ for a constant $\delta > 0$, the lemma gives $e_0' + e_1' + 1 \ge d(\delta m + 1)/m > \delta d$. Another trade-off is given by the following lemma (proved in Appendix C.2):
Lemma 3. Suppose that an $m \times n$ measurement matrix $A$ is $(e_0, e_1, e_0', e_1')$-correcting for $d$-sparse vectors. Then for every $\epsilon > 0$, either $e_1 < (e_1' + 1)m/(\epsilon d)$ or $e_0' \ge (1 - \epsilon)(n - d)/(e_1' + 1)$. □
As mentioned in the introduction, it is important for applications to bring down the amount of false negatives in the reconstruction as much as possible, and ideally to zero. The lemma above shows that if one wishes to keep the number $e_1'$ of false negatives in the reconstruction at zero (or bounded by a constant), only up to an $O(1/d)$ fraction of false negatives in the measurements can be tolerated (regardless of the number of measurements), unless the number $e_0'$ of false positives in the reconstruction grows to an enormous amount (namely, $\Omega(n)$ when $n - d = \Omega(n)$), which is certainly undesirable.

As shown in [21], exact reconstruction of $d$-sparse vectors of length $n$, even in a noise-free setting, requires at least $\Omega(d^2 \log n / \log d)$ non-adaptive measurements. However, it turns out that there is no such restriction when an approximate reconstruction is sought, except for the following bound, which can be shown using simple counting and holds for adaptive noiseless schemes as well (proof in Appendix C.3):

Lemma 4. Let $A$ be an $m \times n$ measurement matrix that is $(0, 0, e_0', e_1')$-correcting for $d$-sparse vectors. Then $m \ge d \log(n/d) - d - e_0' - O(e_1' \log((n - d - e_0')/e_1'))$, where the last term is defined to be zero for $e_1' = 0$. □

This is similar in spirit to the lower bound obtained in [26] for the size of selectors. According to the lemma, even in the noiseless scenario, any reconstruction method that returns an approximation of the sparse vector up to $e_0' = O(d)$ false positives and without false negatives will require $\Omega(d \log(n/d))$ measurements. As we will show in the next section, an upper bound of $O(d \log n)$ is in fact attainable even in a highly noisy setting using only non-adaptive measurements. This in particular implies an asymptotically optimal trivial two-stage group testing scheme.
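The counting behind such bounds is easy to evaluate numerically. A noiseless scheme that reconstructs exactly must give distinct outcomes to distinct $d$-sparse vectors, so it needs at least $\log_2$ of their number in tests; Lemma 4 quantifies how much an approximate reconstruction can save. This minimal sketch assumes logarithms to base 2 and specializes Lemma 4 to $e_1' = 0$ (so the $O(\cdot)$ term vanishes); the helper names are hypothetical. The standard estimate $\binom{n}{d} \ge (n/d)^d$ explains where the $d \log(n/d)$ term comes from.

```python
import math


def log2_num_sparse(n, d):
    """log2 of the number of vectors in F_2^n with at most d nonzero entries."""
    return math.log2(sum(math.comb(n, i) for i in range(d + 1)))


def lemma4_bound(n, d, e0p):
    """Right-hand side of Lemma 4 for e1' = 0: d*log2(n/d) - d - e0'."""
    return d * math.log2(n / d) - d - e0p


n, d = 1024, 8
print(log2_num_sparse(n, d))     # bits needed to name a d-sparse vector exactly
print(lemma4_bound(n, d, e0p=d))  # 40.0: allowing e0' = d false positives
```

Even with $e_0' = d$ false positives allowed, the bound stays of order $d \log(n/d)$, which is the sense in which $O(d \log n)$ measurements are near-optimal.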



4 A Noise-Resilient Construction 

In this section we give our general construction and design measurement matrices for testing $D$-sparse vectors⁸ in $\mathbb{F}_2^N$. The matrices can be seen as adjacency matrices of certain unbalanced bipartite graphs constructed from good randomness condensers or extractors. The main technique that we use to show the desired properties is the list-decoding view of randomness condensers, extractors, and expanders, developed over recent years starting from the work of Ta-Shma and Zuckerman on extractor codes [30]. We start by introducing the terms that we will use in this construction and the analysis.

⁸ In this section we find it more convenient to use capital letters $D, N, \ldots$ instead of the $d, n, \ldots$ that we have used so far, and keep the small letters for their base-2 logarithms.
Definition 5. (mixtures, agreement, and agreement list) Let $\Sigma$ be a finite set. A mixture over $\Sigma^n$ is an $n$-tuple $S := (S_1, \ldots, S_n)$ such that every $S_i$, $i \in [n]$, is a nonempty subset of $\Sigma$. The agreement of $w := (w_1, \ldots, w_n) \in \Sigma^n$ with $S$, denoted by $\mathrm{Agr}(w, S)$, is the quantity $|\{i \in [n] : w_i \in S_i\}|$. Moreover, we define the quantities $\mathrm{wgt}(S) := \sum_{i \in [n]} |S_i|$ and $\rho(S) := \mathrm{wgt}(S)/(n|\Sigma|)$, where the latter is the expected agreement of a uniformly random vector with $S$, normalized by $n$. For a code $C \subseteq \Sigma^n$ and $\alpha \in (0, 1]$, the $\alpha$-agreement list of $C$ with respect to $S$, denoted by $\mathrm{LIST}_C(S, \alpha)$, is the set⁹ $\mathrm{LIST}_C(S, \alpha) := \{c \in C : \mathrm{Agr}(c, S) > \alpha n\}$.
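Definition 5 translates directly into code. This is a minimal sketch, assuming a mixture is a list of Python sets and codewords are tuples; the names are hypothetical, and the $\alpha = 1$ branch follows the footnoted convention of full agreement. The final computation illustrates the interpretation of $\rho(S)$: averaging the agreement over all vectors in $\Sigma^n$ and dividing by $n$ gives $\mathrm{wgt}(S)/(n|\Sigma|)$.

```python
from itertools import product


def agreement(w, S):
    """Agr(w, S): number of coordinates i with w_i in S_i."""
    return sum(1 for wi, Si in zip(w, S) if wi in Si)


def rho(S, sigma_size):
    """rho(S) = wgt(S) / (n |Sigma|), with wgt(S) = sum_i |S_i|."""
    return sum(len(Si) for Si in S) / (len(S) * sigma_size)


def agreement_list(code, S, alpha):
    """LIST_C(S, alpha); for alpha = 1, codewords in full agreement with S."""
    n = len(S)
    if alpha == 1:
        return [c for c in code if agreement(c, S) == n]
    return [c for c in code if agreement(c, S) > alpha * n]


sigma = range(3)
S = [{0}, {0, 1}, {1}]
code = [(0, 0, 1), (0, 1, 2), (2, 2, 2)]
print(agreement_list(code, S, 0.5))  # [(0, 0, 1), (0, 1, 2)]
print(agreement_list(code, S, 1))    # [(0, 0, 1)]

# Average agreement of a uniformly random vector, normalized by n, vs. rho(S).
avg = sum(agreement(w, S) for w in product(sigma, repeat=len(S))) / 3 ** len(S)
print(avg / len(S), rho(S, len(sigma)))
```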

Definition 6. (induced code) Let $f : \Gamma \times \Omega \to \Sigma$ be a function mapping a finite set $\Gamma \times \Omega$ to a finite set $\Sigma$. For $x \in \Gamma$, we use the shorthand $f(x)$ to denote the vector $y := (y_i)_{i \in \Omega}$, $y_i := f(x, i)$, whose coordinates are indexed by the elements of $\Omega$ in a fixed order. The code induced by $f$, denoted by $C(f)$, is the set $\{f(x) : x \in \Gamma\}$. The induced code has a natural encoding function given by $x \mapsto f(x)$.

Definition 7. (codeword graph) Let $C \subseteq \Sigma^n$, $|\Sigma| = q$, be a $q$-ary code. The codeword graph of $C$ is a bipartite graph with left vertex set $C$ and right vertex set $[n] \times \Sigma$, such that for every $x = (x_1, \ldots, x_n) \in C$, there is an edge between $x$ on the left and $(1, x_1), \ldots, (n, x_n)$ on the right. The adjacency matrix of the codeword graph is an $n|\Sigma| \times |C|$ binary matrix whose $(i, j)$th entry is 1 iff there is an edge between the $i$th right vertex and the $j$th left vertex.
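Definitions 6 and 7 combine into a short routine that turns a function $f$ into a binary matrix, which is the kind of matrix Theorem 9 uses for measurements. This is a minimal sketch, assuming $\Gamma$, $\Omega$ and $\Sigma$ are given as Python sequences; the names are hypothetical. Columns are indexed by $x \in \Gamma$ (so if two inputs induce the same codeword, the column simply repeats), and the right vertex $(i, s)$ is mapped to row $i \cdot |\Sigma| + \mathrm{index}(s)$.

```python
def induced_code(f, gamma, omega):
    """C(f) as a list indexed by x in gamma: f(x) = (f(x, i))_{i in omega}."""
    return [tuple(f(x, i) for i in omega) for x in gamma]


def codeword_graph_matrix(code, sigma):
    """n|Sigma| x |C| adjacency matrix of the codeword graph of `code`."""
    sigma = list(sigma)
    index = {s: k for k, s in enumerate(sigma)}
    n = len(code[0])
    matrix = [[0] * len(code) for _ in range(n * len(sigma))]
    for j, c in enumerate(code):
        for i, s in enumerate(c):
            # edge between codeword j (left) and right vertex (i, s)
            matrix[i * len(sigma) + index[s]][j] = 1
    return matrix


# Toy function f(x, i) = (x + i) mod 3 on Gamma = {0, 1, 2}, Omega = {0, 1}.
code = induced_code(lambda x, i: (x + i) % 3, range(3), range(2))
A = codeword_graph_matrix(code, range(3))
print(code)  # [(0, 1), (1, 2), (2, 0)]
for row in A:
    print(row)
```

Every column has exactly $|\Omega|$ ones, one in each block of $|\Sigma|$ rows; this regular structure is what the threshold decoder after Theorem 9 relies on.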

The following is a straightforward generalization of the result in [30] that is also shown in [31] (we have included a proof in Appendix C.4):

Theorem 8. Let $f : \mathbb{F}_2^n \times \mathbb{F}_2^t \to \mathbb{F}_2^\ell$ be a strong $k \to_\epsilon k'$ condenser, and $C \subseteq \Sigma^{2^t}$ be its induced code, where $\Sigma := \mathbb{F}_2^\ell$. Then for any mixture $S$ over $\Sigma^{2^t}$ we have $|\mathrm{LIST}_C(S, \rho(S) 2^{\ell - k'} + \epsilon)| < 2^k$. □

Now using the above tools, we are ready to describe our construction of error- 
tolerant measurement matrices. We first state a general result without specifying 
the parameters of the condenser, and then instantiate the construction with 
various choices of the condenser, resulting in matrices with different properties. 

Theorem 9. Let $f : \mathbb{F}_2^n \times \mathbb{F}_2^t \to \mathbb{F}_2^\ell$ be a strong $k \to_\epsilon k'$ condenser, let $C$ be its induced code, and define the capital shorthands $K := 2^k$, $K' := 2^{k'}$, $L := 2^\ell$, $N := 2^n$, $T := 2^t$. Suppose that the parameters $p, \nu, \gamma > 0$ are chosen such that $(p + \gamma)L/K' + \nu/\gamma < 1 - \epsilon$, and let $D := \gamma L$. Then the adjacency matrix of the codeword graph of $C$ (which has $M := TL$ rows and $N$ columns) is a $(pM, (\nu/D)M, K - D, 0)$-correcting measurement matrix for $D$-sparse vectors. Moreover, it allows for a reconstruction algorithm with running time $O(MN)$.
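Because Theorem 9 is stated in terms of several interacting parameters, it can help to evaluate them for concrete numbers. This is a minimal sketch with hypothetical names and illustrative values that are not tied to any particular condenser: it checks the hypothesis $(p + \gamma)L/K' + \nu/\gamma < 1 - \epsilon$ and reports $M$, $N$, $D$, the tolerated measurement errors $pM$ and $(\nu/D)M$, and the bound $K - D$ on false positives in the output.

```python
from dataclasses import dataclass


@dataclass
class Theorem9Params:
    k: int          # min-entropy requirement of the condenser (bits)
    k_prime: int    # output min-entropy k'
    ell: int        # output length, so L = 2^ell
    n: int          # input length, so N = 2^n columns
    t: int          # seed length, so T = 2^t
    p: float
    nu: float
    gamma: float
    eps: float

    def condition_holds(self):
        L, K_prime = 2 ** self.ell, 2 ** self.k_prime
        return (self.p + self.gamma) * L / K_prime + self.nu / self.gamma < 1 - self.eps

    def summary(self):
        K, L, N, T = 2 ** self.k, 2 ** self.ell, 2 ** self.n, 2 ** self.t
        D = self.gamma * L
        M = T * L
        return {"M": M, "N": N, "D": D,
                "false_pos_in_tests": self.p * M,
                "false_neg_in_tests": self.nu / D * M,
                "false_pos_in_output": K - D}


# Zero overhead (ell = k'), as for an extractor.
params = Theorem9Params(k=10, k_prime=8, ell=8, n=20, t=5,
                        p=0.1, nu=0.02, gamma=0.25, eps=0.05)
print(params.condition_holds(), params.summary())
```

Note that $(\nu/D)M = \nu T/\gamma$: the number of tolerated false negatives is a fixed fraction of the $T$ ones in each column, which is exactly the slack used by the threshold decoder.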

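A proof of the theorem is given in Appendix C.5, which uses Theorem 8 as an essential tool. Here we recall a description of the reconstruction algorithm from the proof: Let $y \in \mathbb{F}_2^M$ be the observation outcome, and denote by $m_{ji}$ the $(j, i)$th entry of the measurement matrix. Then the reconstruction algorithm outputs a vector $\hat{x}$ that satisfies

$$(\forall i \in [N]) \quad \hat{x}_i = 1 \iff |\{j \in [TL] : y_j = m_{ji} = 1\}| \ge T(1 - \nu/\gamma).$$

Since every column of the matrix contains exactly $T$ ones and at most $(\nu/D)M = \nu T/\gamma$ outcomes are flipped from 1 to 0, every position in the support of $x$ passes this threshold, so the reconstruction has no false negatives.

⁹ When $\alpha = 1$, we consider codewords with full agreement with the mixture.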
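The reconstruction rule above is a simple thresholding of column agreements. The sketch below applies it to the codeword-graph matrix of a toy random table $f : [N] \times [T] \to [L]$, which merely stands in for a condenser: a random table is not certified to satisfy the hypotheses of Theorem 9, so only the one-sided part of the guarantee, which holds for any $f$, is exercised here. It assumes the ratio $\nu/\gamma$ is passed directly; all names and parameter values are hypothetical.

```python
import random


def codeword_matrix(f, N, T, L):
    """Codeword-graph matrix of f: [N] x [T] -> [L]; row t*L + s, column x."""
    A = [[0] * N for _ in range(T * L)]
    for x in range(N):
        for t in range(T):
            A[t * L + f(x, t)][x] = 1
    return A


def decode(A, y, T, nu_over_gamma):
    """x_hat_i = 1 iff #{j : y_j = m_ji = 1} >= T (1 - nu/gamma)."""
    threshold = T * (1 - nu_over_gamma)
    N = len(A[0])
    return [int(sum(1 for j, row in enumerate(A) if y[j] and row[i]) >= threshold)
            for i in range(N)]


def simulate(seed=1, N=64, T=16, L=8, support=(3, 17, 42),
             nu_over_gamma=0.25, false_pos=10):
    rng = random.Random(seed)
    table = [[rng.randrange(L) for _ in range(T)] for _ in range(N)]
    A = codeword_matrix(lambda x, t: table[x][t], N, T, L)
    y = [int(any(row[i] for i in support)) for row in A]  # noiseless A[x]
    ones = [j for j, b in enumerate(y) if b]
    zeros = [j for j, b in enumerate(y) if not b]
    # Adversary: flip nu*T/gamma positive outcomes to 0 and some zeros to 1.
    for j in rng.sample(ones, int(nu_over_gamma * T)):
        y[j] = 0
    for j in rng.sample(zeros, min(false_pos, len(zeros))):
        y[j] = 1
    x_hat = decode(A, y, T, nu_over_gamma)
    return set(support), {i for i, b in enumerate(x_hat) if b}


truth, found = simulate()
print(truth <= found, len(found))
```

How many extra positions appear in `found` depends on the quality of $f$; bounding them by $K - D$ is exactly what the condenser property in Theorem 9 provides.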

Remark 1. The extractor codes that we use in Theorem 9 are instances of soft-decision decodable codes¹⁰ that provide high list-decodability in "extremely noisy" scenarios. In fact, it is not hard to see that good extractors or condensers are required for our construction to carry through, as Theorem 8 can be shown to hold in the reverse direction as well. However, for designing measurement matrices for the noiseless (or low-noise) case, it is possible to resort to the slightly weaker notion of list-recoverable codes. This is discussed in more detail in Appendix A.

Tolerance on Incorrect Estimates. The result given by Theorem 9 requires at least an overestimate on the number of defectives (i.e., the sparsity level $D$ that is controlled by the parameter $\gamma$) and the level of measurement noise (given by the parameters $p$ and $\nu$). However, if in the actual experiment the estimates turn out to be incorrect but the trade-off on $\gamma, p, \nu$ required by the theorem remains valid (e.g., if the number of defectives turns out higher but the fraction of measurement errors lower than expected), we can still guarantee a reliable reconstruction. In case there is no reliable estimate on $D$ available, one can use a number of trial-and-error rounds by starting from an initial guess of $D = 1$ and doubling the guess in each round, until at some point the Hamming weight of the reconstruction does not exceed the amount $K$ guaranteed by the theorem. For all our instantiations that follow, the total number of measurements required by this process remains in the same order as if we knew the actual value of $D$.
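The trial-and-error strategy can be sketched as a loop over guesses $D = 1, 2, 4, \ldots$. In this minimal sketch, `run_scheme(D)` is a hypothetical callback that designs the matrix for sparsity $D$, performs the tests and returns the reconstruction, and `bound_K(D)` returns the output-size bound $K$ from the theorem for that guess. The mock scheme is purely illustrative: it returns an all-ones vector whenever $D$ underestimates the true sparsity, and otherwise the true positives plus $D$ false ones.

```python
def doubling_search(run_scheme, bound_K, N):
    """Double the sparsity guess until the reconstruction has weight <= K."""
    D = 1
    while True:
        x_hat = run_scheme(D)
        if sum(x_hat) <= bound_K(D) or D >= N:
            return D, x_hat
        D *= 2


def mock_scheme(D, true_d=5, N=64):
    """Hypothetical stand-in for a real scheme (for illustration only)."""
    if D < true_d:
        return [1] * N  # guess too small: the output is not trustworthy
    weight = true_d + D  # true positives plus at most K - D = D false ones
    return [1] * weight + [0] * (N - weight)


D, x_hat = doubling_search(mock_scheme, lambda D: 2 * D, N=64)
print(D, sum(x_hat))  # 8 13
```

When the number of measurements grows linearly in $D$, as with $O(D \log N)$, the rounds form a geometric series dominated by the last one, and the final guess is at most twice the true sparsity.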

Instantiations 

Now we instantiate the general result given by Theorem 9 with various choices of the underlying condenser and compare the obtained parameters. First, we consider two extreme cases, namely, a non-explicit optimal condenser with zero overhead (i.e., an extractor) and then a non-explicit optimal condenser with zero loss (i.e., a lossless condenser), and then consider how known explicit constructions can approach the obtained bounds. We remark that the sampling rate, as defined in [27], of these instantiations (i.e., the maximum number of tests any individual is included in) is an $O(1/D)$ fraction of the number of tests.

¹⁰ To be precise, here we are dealing with a special case of soft-decision decoding with binary weights.

Applying Optimal Extractors. Radhakrishnan and Ta-Shma showed that, non-constructively, for every $k$, $n$, $\epsilon$, there is a strong $(k, \epsilon)$-extractor with seed length $t = \log(n - k) + 2\log(1/\epsilon) + O(1)$ and output length $\ell = k - 2\log(1/\epsilon) - O(1)$, which is the best one can hope for [32]. In particular, they show that a random function achieves these parameters with probability $1 - o(1)$. Plugging this result into Theorem 9, we obtain a non-explicit measurement matrix from a simple randomized construction that achieves the desired trade-off with high probability (see Appendix C.6 for the proof details):
Corollary 10. For every choice of constants $p \in [0, 1)$ and $\nu \in [0, \nu_0)$, $\nu_0 := (\sqrt{5 - 4p} - 1)^3/8$, and positive integers $D$ and $N \ge D$, there is an $M \times N$ measurement matrix, where $M = O(D \log N)$, that is $(pM, (\nu/D)M, O(D), 0)$-correcting for $D$-sparse vectors of length $N$ and allows for a reconstruction algorithm with running time $O(MN)$. □
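As a numerical sanity check on the constant $\nu_0$ (not a reproduction of the proof in Appendix C.6), one can verify that it is compatible with the hypothesis of Theorem 9. For an extractor the overhead is zero, so $L = K'$ and the hypothesis reads $p + \gamma + \nu/\gamma < 1 - \epsilon$. Writing $u := (\sqrt{5 - 4p} - 1)/2$, the positive root of $u^2 + u = 1 - p$, we get $\nu_0 = u^3$, and the choice $\gamma := u^2$ gives $p + \gamma + \nu/\gamma < p + u^2 + u = 1$ for every $\nu < \nu_0$, so a small enough $\epsilon$ works. The choice of $\gamma$ is ours, made for illustration; the helper names are hypothetical.

```python
import math


def nu0(p):
    """nu_0 = (sqrt(5 - 4p) - 1)^3 / 8 from Corollary 10."""
    return (math.sqrt(5 - 4 * p) - 1) ** 3 / 8


def extractor_condition(p, nu, gamma, eps):
    """Hypothesis of Theorem 9 specialized to zero overhead (L = K')."""
    return p + gamma + nu / gamma < 1 - eps


for p in (0.0, 0.25, 0.5, 0.9):
    u = (math.sqrt(5 - 4 * p) - 1) / 2     # positive root of u^2 + u = 1 - p
    gamma = u * u                          # illustrative choice of gamma
    nu = 0.99 * nu0(p)                     # any nu < nu_0
    slack = 1 - (p + gamma + nu / gamma)   # room left for the error eps
    print(p, round(nu0(p), 4), extractor_condition(p, nu, gamma, slack / 2))
```

The check also shows that $\nu_0$ never exceeds $(1-p)^2/4$, the largest $\nu$ that the inequality $p + \gamma + \nu/\gamma < 1$ can accommodate for any $\gamma$.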

This instantiation, in particular, reproduces a result on the randomized construction of approximate group testing schemes with an optimal number of measurements in [5], but with additional, stringent guarantees on the noise tolerance of the scheme.

Applying Optimal Lossless Condensers. The probabilistic construction of Radhakrishnan and Ta-Shma can be extended to the case of lossless condensers, and one can show that a random function is with high probability a strong $k \to_\epsilon k$ condenser with seed length $t = \log n + \log(1/\epsilon) + O(1)$ and output length $\ell = k + \log(1/\epsilon) + O(1)$ [33]. This combined with Theorem 9 gives the following corollary (proof in Appendix C.7):

Corollary 11. For positive integers $N \ge D$ and every constant $\delta > 0$ there is an $M \times N$ measurement matrix, where $M = O(D \log N)$, that is $(\Omega(M), \Omega(1/D)M, \delta D, 0)$-correcting for $D$-sparse vectors of length $N$ and allows for a reconstruction algorithm with running time $O(MN)$. □

Both results obtained in Corollaries 10 and 11 almost match the lower bound of Lemma 4 for the number of measurements. However, we note the following distinction between the two results: Instantiating the general construction of Theorem 9 with an extractor gives us sharp control over the fraction of tolerable errors, and in particular, we can obtain a measurement matrix that is robust against any constant fraction (bounded away from 1) of false positives. However, the number of false positives in the reconstruction will be bounded by some constant fraction of the sparsity of the vector that cannot be made arbitrarily close to zero. On the other hand, a lossless condenser enables us to bring down the number of false positives in the reconstruction to an arbitrarily small fraction of $D$ (which is, in light of Lemma 2, the best we can hope for), but does not give as good a control on the fraction of tolerable errors as in the extractor case, though we still obtain resilience against the same order of errors.

Applying the Guruswami–Umans–Vadhan Extractor. While Corollaries 10 and 11 give probabilistic constructions of noise-resilient measurement matrices, certain applications require a fully explicit matrix that is guaranteed to work. To that end, we need to instantiate Theorem 9 with an explicit condenser. First, we use a nearly optimal explicit extractor due to Guruswami, Umans and Vadhan, summarized in the following theorem:

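Theorem 12. [31] For all positive integers $n \ge k$ and all $\epsilon > 0$, there is an explicit strong $(k, \epsilon)$-extractor $\mathrm{Ext} : \mathbb{F}_2^n \times \mathbb{F}_2^t \to \mathbb{F}_2^\ell$ with $\ell = k - 2\log(1/\epsilon) - O(1)$ and $t = \log n + O(\log k \cdot \log(k/\epsilon))$. □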

Applying this result in Theorem 9, we obtain a similar trade-off as in Corollary 10, except for a higher number of measurements, which would be bounded by $O(2^{O((\log\log D)^2)} D \log N) = O(D^{1+o(1)} \log N)$.
Applying the Zig-Zag Lossless Condenser. In [33] an explicit lossless condenser with optimal output length is constructed. In particular, the authors show the following:

Theorem 13. [33] For every $k \le n \in \mathbb{N}$ and $\epsilon > 0$ there is an explicit $k \to_\epsilon k$ condenser¹¹ with seed length $O(\log^3(n/\epsilon))$ and output length $k + \log(1/\epsilon) + O(1)$.

Combined with Theorem 9, we obtain a similar result as in Corollary 11, except that the number of measurements would be $D \cdot 2^{O(\log^3 \log N)} = D \cdot \mathrm{quasipoly}(\log N)$.

Measurements Allowing Sublinear Time Reconstruction. The naive reconstruction
algorithm given by Theorem 9 runs in time linear in the
size of the measurement matrix. However, as mentioned in the introduction, for
very sparse vectors (i.e., $D \ll N$) it might be of practical importance to have
a reconstruction algorithm that runs in time sublinear in N, the length of the
vector, and ideally, polynomial in the number of measurements, which is merely
$\mathrm{poly}(\log N, D)$ if the number of measurements is optimal.

As shown in [30], if the code C in Theorem 8 is obtained from a strong
extractor constructed from a black-box pseudorandom generator (PRG), it is
possible to compute the agreement list (which is guaranteed by the theorem
to be small) more efficiently than by a simple exhaustive search over all possible
codewords. In particular, in this case they show that $\mathrm{LIST}_C(S, \rho(S) + \epsilon)$ can be
computed in time $\mathrm{poly}(2^t, 2^\ell, 2^k, 1/\epsilon)$, which can be much smaller than $2^n$. On the
other hand, observe that the main computational task done by the reconstruction
algorithm in Theorem 9 is in fact computing a suitable agreement list for the
induced code of the underlying condenser.

Currently, two constructions of extractors from black-box PRGs are known:
Trevisan's extractor [34] (as well as its improvement in [35]) and the Shaltiel-Umans
extractor [36]. However, the latter can only extract a sub-constant fraction of
the min-entropy and is not suitable for our needs, although it requires a considerably
shorter seed than Trevisan's extractor. Thus, here we only consider the
improvement of Trevisan's extractor given by Raz et al., quoted below.

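Theorem 14. [35] For every $n, k, \ell \in \mathbb{N}$ ($\ell \le k \le n$) and $\epsilon > 0$, there is
an explicit strong $(k, \epsilon)$-extractor $\mathrm{Tre}\colon \mathbb{F}_2^n \times \mathbb{F}_2^t \to \mathbb{F}_2^\ell$ with $t = O(\log^2(n/\epsilon) \cdot \log(1/\alpha))$,
where $\alpha := k/(\ell - 1) - 1$ must be less than 1/2. □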

Using this result in Theorem 9, we obtain a measurement matrix for which
the reconstruction is possible in time polynomial in the number of measurements;
however, as the seed length required by this extractor is larger than that in
Theorem 12, we now require a higher number of measurements than before.
Specifically, we obtain the same parameters as in Corollary 10 using Trevisan's
extractor, except for the number of measurements, $M = O(D 2^{O(\log^3 \log N)}) = D \cdot \mathrm{quasipoly}(\log N)$.

Though not explicitly mentioned in [33], these condensers can be considered to be 
strong. 
Furthermore, Guruswami et al. [31] construct lossless (and lossy) condensers
that are not known to correspond to black-box PRGs but nevertheless allow efficient
list recovery. In particular, they show the following:

Theorem 15. [31] For all constants $\alpha \in (0, 1)$ and every $k \le n \in \mathbb{N}$, $\epsilon > 0$, there
is an explicit strong $k \to_\epsilon k$ condenser with seed length $t = (1 + 1/\alpha)\log(nk/\epsilon) + O(1)$
and output length $\ell = t + (1 + \alpha)k$. Moreover, the condenser has efficient
list recovery. □

The code induced by the condenser given by this theorem is precisely a
Parvaresh-Vardy code [37], and thus the efficient list recovery is merely the
list-decoding algorithm for this code. Combined with Theorem 9, we can show that
codeword graphs of Parvaresh-Vardy codes correspond to good measurement
matrices that allow sublinear time recovery, but with parameters incomparable to
what we obtained from Trevisan's extractor (the proof is similar to that of Corollary 11):

Corollary 16. For positive integers $N \ge D$ and any constants $\delta, \alpha > 0$, there
is an $M \times N$ measurement matrix, where $M = O(D^{3+\alpha+2/\alpha}(\log N)^{2+2/\alpha})$, that
is $(\Omega(e), \Omega(e/D), \delta D, 0)$-correcting for D-sparse vectors of length N, where
$e := (\log N)^{1+1/\alpha} D^{2+1/\alpha}$. Moreover, the matrix allows for a reconstruction algorithm
with running time $\mathrm{poly}(M)$. □

We remark that we could also use a lossless condenser due to Ta-Shma et al.
[38] which is based on Trevisan's extractor and also allows efficient list recovery,
but it achieves inferior parameters compared to Corollary 16.

Future Work 

For the purpose of this exposition, we have focused on the asymptotic trade-offs
and on several occasions have neglected certain details, such as the hidden
constants in the $O(\cdot)$ notation, that become important for practical purposes.
We defer the task of estimating and optimizing these parameters, as well as
obtaining experimental results, to subsequent work. Moreover, an interesting
theoretical question is whether our reduction from group testing schemes
to the construction of condensers holds in the reverse direction as well; namely,
whether one can obtain a good condenser from any highly noise-resilient group
testing scheme.
A Connection with List-Recoverability



As pointed out in an earlier remark, measurement matrices that approximate sparse
vectors using a small number of noiseless measurements can be constructed from
list-recoverable codes. Formally, a code C of block length n over an alphabet $\Sigma$
is called $(\alpha, D, L)$-list recoverable if for every mixture S over $\Sigma^n$ consisting of
sets of size at most D each, we have $|\mathrm{LIST}_C(S, \alpha)| \le L$. A simple argument
similar to Theorem 9 shows that the adjacency matrix of the codeword graph
of such a code with rate R gives a $((\log N)|\Sigma|/R) \times N$ measurement matrix for
D-sparse vectors in the noiseless case, with at most $L - D$ false positives in the
reconstruction. Ideally, a list-recoverable code with $\alpha = 1$, alphabet size $O(D)$,
positive constant rate, and list size $L = O(D)$ would give an $O(D \log N) \times N$
matrix for D-sparse vectors, which is almost optimal (furthermore, the recovery
would be possible in sublinear time if C is equipped with efficient list recovery).
However, no explicit construction of such a code is known so far.
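The noiseless reduction can be illustrated on a toy Reed-Solomon code. The Python sketch below is our own illustration, not a construction from the paper: the prime field size q = 7, the message length k = 2, the support and all helper names are hypothetical choices, and LIST is read as the set of codewords with agreement at least $\alpha$. The OR-encoding of a D-sparse vector, viewed through the codeword graph, is the mixture $S_i = \{w_i : w \in \mathrm{supp}\}$, and reconstruction returns $\mathrm{LIST}_C(S, 1)$.

```python
from itertools import product


def reed_solomon_code(q, k):
    """All codewords of the [q, k] Reed-Solomon code over the prime field F_q:
    evaluations of every polynomial of degree < k at the points 0, 1, ..., q-1."""
    code = []
    for coeffs in product(range(q), repeat=k):
        word = tuple(sum(c * pow(a, j, q) for j, c in enumerate(coeffs)) % q
                     for a in range(q))
        code.append(word)
    return code


def mixture_from_support(code, support):
    """Noiseless outcome seen through the codeword graph: S_i = {w_i : w in support}."""
    n = len(code[0])
    return [{code[x][i] for x in support} for i in range(n)]


def agreement(word, mixture):
    """Agr(w, S): fraction of positions i with w_i in S_i."""
    return sum(w in s for w, s in zip(word, mixture)) / len(mixture)


def list_recover(code, mixture, alpha):
    """Indices of all codewords whose agreement with the mixture is at least alpha."""
    return [x for x, w in enumerate(code) if agreement(w, mixture) >= alpha]


code = reed_solomon_code(q=7, k=2)   # N = 49 codewords (columns), block length n = 7
support = [3, 20]                    # support of a D-sparse vector, D = 2
S = mixture_from_support(code, support)
print(list_recover(code, S, alpha=1.0))   # [3, 20]: no false positives here
```

The list always contains the D true codewords. In this particular toy case it contains nothing else, because any other codeword agrees with each true codeword in at most $k - 1$ positions and $D(k - 1) < q$; for general list-recoverable codes the list may contain up to $L - D$ false positives.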

Two natural choices of list-recoverable codes are Reed-Solomon and algebraic-geometric
codes, which in fact provide soft-decision decoding with short list size
(cf. [39]). However, while the list size is polynomially bounded in n and D, it
can be much larger than the $O(D)$ that we need for our application, even if the
rate is polynomially small in D. On the other hand, it is shown in [40] that
folded Reed-Solomon codes are list-recoverable with constant rate, but again
they suffer from large alphabet and list sizes. We also point out a construction
of $(\alpha, D, D)$ list-recoverable codes (allowing list recovery in time $O(nD)$) in [40]
with rate polynomially small but alphabet size exponentially large in D, from
which the authors obtain superimposed codes.

B Connection with the Bit-Probe Model and Designs 

An important problem in data structures is the static set membership problem
in the bit-probe model, which is the following: Given a set S of at most d elements
from a universe of size n, store the set as a string of length m such that any
query of the type "is x in S?" can be reliably answered by reading a few bits of
the encoding. The query algorithm might be probabilistic, and be allowed to
err with a small one- or two-sided error. Information-theoretically, it is easy to
see that $m = \Omega(d \log(n/d))$ regardless of the bit-probe complexity and even if a
small constant error is allowed.

Remarkably, it was shown in [41] that the lower bound on m can be (non-explicitly)
achieved using only one bit-probe. Moreover, a part of their work
shows that any one-probe scheme with negative one-sided error $\epsilon$ (where the
scheme only errs in case $x \notin S$) gives a $\lfloor d/\epsilon \rfloor$-superimposed code (and hence
requires $m = \Omega(d^2 \log n / \log d)$ by [21]). It follows that from any such scheme one
can obtain a measurement matrix for exact reconstruction of sparse vectors,
which, by Lemma 2, cannot provide high resiliency against noise. The converse
direction, i.e., using superimposed codes to design bit-probe schemes, does not
necessarily hold unless the error is allowed to be very close to 1. However, in [41]
combinatorial designs based on low-degree polynomials are used to construct
one bit-probe schemes with $m = O(d^2 \log^2 n)$ and small one-sided error.

For codes over large alphabets, the factor $|\Sigma|$ in the number of rows can be improved
using concatenation with a suitable inner measurement matrix.

As shown in [51], folded Reed-Solomon codes can be used to construct lossless condensers,
which eliminates the list size problem. However, they give inferior parameters
compared to Parvaresh-Vardy codes used in Corollary 16.

On the other hand, Kautz and Singleton [24] observed that the encoding of
a combinatorial design as a binary matrix corresponds to a superimposed code
(which is in fact slightly error-resilient). Moreover, they used Reed-Solomon
codes to construct a design, which in particular gives a d-superimposed code.
This is in fact the same design that is used in [41], and in our terminology, it can
be regarded as the adjacency matrix of the codeword graph of a Reed-Solomon
code. It is interesting to observe the intimate similarity between our framework
given by Theorem 9 and classical constructions of superimposed codes. However,
some key differences are worth mentioning. Both constructions are based
on codeword graphs of error-correcting codes, but classical superimposed
codes owe their properties to the large distance of the underlying code, whereas
our construction uses extractor and condenser codes and does
not give a superimposed code, simply because of the substantially lower number
of measurements. Nevertheless, as shown in Theorem 9, these codes are good enough for
a slight relaxation of the notion of superimposed codes because of their soft-decision
list-decodability properties, which additionally enable us to attain high
noise resilience with a considerably smaller number of measurements.
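The Kautz-Singleton observation can be checked by brute force on a toy instance. In the sketch below (our illustration; q, k, d and the helper names are arbitrary assumptions), the column of each Reed-Solomon codeword w has its ones in rows $(i, w_i)$, exactly as in a codeword graph, and `is_d_disjunct` verifies that no column is covered by the OR of any d other columns. This is guaranteed when $d(k - 1) < q$, since two distinct codewords agree in at most $k - 1$ positions.

```python
from itertools import combinations, product


def rs_codewords(q, k):
    # Evaluations of all polynomials of degree < k over the prime field F_q.
    return [tuple(sum(c * pow(a, j, q) for j, c in enumerate(coeffs)) % q
                  for a in range(q))
            for coeffs in product(range(q), repeat=k)]


def codeword_graph_columns(code):
    # Column of codeword w = set of rows (i, w_i); rows are indexed by [n] x F_q.
    return [frozenset((i, s) for i, s in enumerate(w)) for w in code]


def is_d_disjunct(columns, d):
    # True iff no column is contained in the union of d other columns.
    for others in combinations(range(len(columns)), d):
        union = frozenset().union(*(columns[j] for j in others))
        for c in range(len(columns)):
            if c not in others and columns[c] <= union:
                return False
    return True


q, k, d = 5, 2, 2
cols = codeword_graph_columns(rs_codewords(q, k))
print(len(cols), is_d_disjunct(cols, d))   # 25 True  (n * q = 25 rows as well)
```

Violating the condition breaks the property: for q = 3, k = 2, three lines through the three points of a fourth line cover its column, so the matrix is not 3-disjunct.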

Interestingly, Buhrman et al. [41] use randomly chosen bipartite graphs to
construct storage schemes with two-sided error requiring nearly optimal space
$O(d \log n)$, and Ta-Shma [32] later showed that expander graphs from lossless
condensers would be sufficient for this purpose. However, unlike schemes with
negative one-sided error, these schemes use encoders that cannot be implemented
by the OR function and thus do not translate to group testing schemes.



C Omitted Proofs 
C.1 Proof of Lemma 2

We use arguments similar to those used in [43, 44] in the context of black-box
hardness amplification in NP: Define a partial ordering $\preceq$ between binary vectors
using bit-wise comparisons (with $0 < 1$). Let $t := d/(e'_0 + e'_1 + 1)$ be an integer,
and consider any monotonically increasing sequence of vectors $x_0 \preceq \cdots \preceq x_t$ in
$\mathbb{F}_2^n$, where $x_i$ has weight $i(e'_0 + e'_1 + 1)$. Thus, $x_0$ and $x_t$ will have weights zero
and d, respectively. Note that we must also have $A[x_0] \preceq \cdots \preceq A[x_t]$ due to the
monotonicity of the OR function.

A fact that is directly deduced from Definition 1 is that, for every $x, x' \in \mathbb{F}_2^n$,
if $(A[x], A[x'])$ are $(e_0, e_1)$-close, then x and x' must be $(e'_0 + e'_1, e'_0 + e'_1)$-close. This
can be seen by setting $y := A[x']$ in the definition, for which there exists a valid
decoding $z \in \mathbb{F}_2^n$. As $(A[x], y)$ are $(e_0, e_1)$-close, the definition implies that $(x, z)$
must be $(e'_0, e'_1)$-close. Moreover, $(A[x'], y)$ are $(0, 0)$-close and thus $(e_0, e_1)$-close,
which implies that $(z, x')$ must be $(e'_1, e'_0)$-close. Thus, by the triangle inequality,
$(x, x')$ must be $(e'_0 + e'_1, e'_0 + e'_1)$-close.

A design is a collection of subsets of a universe, each of the same size, such that the
pairwise intersection of any two subsets is upper bounded by a prespecified parameter.

For the sake of simplicity in this presentation, we ignore the fact that certain fractions
might in general give non-integer values. However, it should be clear that this will
cause no loss of generality.

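Now, observe that for all i, $(x_i, x_{i+1})$ are $(e'_0 + e'_1, e'_0 + e'_1)$-far, and hence their
encodings must be $(e_0, e_1)$-far, by the fact we just mentioned. Since $A[x_i] \preceq A[x_{i+1}]$,
the two encodings can only differ in positions where $A[x_{i+1}]$ has a one, so $A[x_{i+1}]$ has
at least $e_0 + 1$ more ones than $A[x_i]$. In particular, this implies that $A[x_t]$ must have
weight at least $t(e_0 + 1)$, which is trivially upper bounded by m. Hence it follows that
$(e_0 + 1)/(e'_0 + e'_1 + 1) \le m/d$. Similarly, we can also show that $(e_1 + 1)/(e'_0 + e'_1 + 1) \le m/d$.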
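The argument above rests on the monotonicity of disjunctive measurements: $x \preceq x'$ implies $A[x] \preceq A[x']$. The following Python sketch checks this exhaustively for one small random 0/1 measurement matrix; the dimensions, the seed and the helper names are arbitrary assumptions made only for illustration.

```python
import random
from itertools import product


def or_encode(A, x):
    # A[x]: outcome i is the OR of x_j over the columns j selected by row i.
    return tuple(int(any(a and b for a, b in zip(row, x))) for row in A)


def precedes(u, v):
    # Bit-wise partial order u <= v (with 0 < 1).
    return all(a <= b for a, b in zip(u, v))


random.seed(1)
m, n = 6, 8
A = [[random.randint(0, 1) for _ in range(n)] for _ in range(m)]

vectors = list(product((0, 1), repeat=n))
monotone = all(precedes(or_encode(A, x), or_encode(A, y))
               for x in vectors for y in vectors if precedes(x, y))
print(monotone)  # True: OR measurements are monotone for every 0/1 matrix
```

Monotonicity is what forces consecutive encodings along the chain to differ only by added ones, which is where the weight bound $t(e_0 + 1) \le m$ comes from.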

C.2 Proof of Lemma 3

Let $x \in \mathbb{F}_2^n$ be chosen uniformly at random among vectors of weight d. Randomly
flip $e'_1 + 1$ of the bits on the support of x to 0, and denote the resulting vector
by x'. Using the partial ordering $\preceq$ in the proof of the last lemma, it is obvious
that $x' \preceq x$, and hence $A[x'] \preceq A[x]$. Let b denote any disjunction of a number
of coordinates in x and b' the same disjunction in x'. We must have

$$\Pr[b' = 0 \mid b = 1] \le \frac{e'_1 + 1}{d},$$

as for b to be 1 at least one of the variables on the support of x must be present
in the disjunction, and one particular such variable must necessarily be flipped to
bring the value of b' down to zero. Using this, the expected Hamming distance
between A[x] and A[x'] can be bounded as follows:

$$\mathbb{E}[\mathrm{dist}(A[x], A[x'])] = \sum_{i \in [m]} \Pr[A[x]_i = 1 \wedge A[x']_i = 0] \le \frac{e'_1 + 1}{d} \cdot m,$$

where the expectation is over the randomness of x and the bit flips. Fix a particular
choice of x' that keeps the expectation at most $(e'_1 + 1)m/d$. Now the
randomness is over the possibilities of x, that is, setting $e'_1 + 1$ randomly chosen zero
coordinates of x' to 1. Denote by X the set of possibilities of x for which A[x] and
A[x'] are $((e'_1 + 1)m/(\epsilon d))$-close, and by S the set of all vectors that are monotonically
larger than x' and differ from it in exactly $e'_1 + 1$ positions. Obviously, $X \subseteq S$, and, by Markov's
inequality, we know that $|X| \ge (1 - \epsilon)|S|$.
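The bound $\Pr[b' = 0 \mid b = 1] \le (e'_1 + 1)/d$ used above can be confirmed by exact enumeration on small parameters. The sketch below is our illustration (n, d, $e'_1$ and the helper name are hypothetical choices); it enumerates every weight-d vector x and every flip pattern, which are all equally likely, and uses exact rational arithmetic.

```python
from fractions import Fraction
from itertools import combinations


def prob_disjunction_killed(n, d, flips, Q):
    """Exact Pr[b' = 0 | b = 1], where b is the OR of the coordinates of x in Q,
    x is a uniformly random weight-d vector in F_2^n, and x' is obtained from x
    by setting `flips` uniformly chosen support bits to 0 (b' is the same OR on x')."""
    Q = set(Q)
    conditioned = killed = 0
    for supp in combinations(range(n), d):          # every weight-d x, equally likely
        if not Q & set(supp):
            continue                                 # b = 0, outside the conditioning event
        for flipped in combinations(supp, flips):    # every flip pattern, equally likely
            conditioned += 1
            if not Q & (set(supp) - set(flipped)):
                killed += 1                          # b' = 0
    return Fraction(killed, conditioned)


n, d, e1p = 7, 4, 1                                  # e1p stands for e'_1
bound = Fraction(e1p + 1, d)
for size in range(1, n + 1):
    p = prob_disjunction_killed(n, d, e1p + 1, range(size))
    print(size, p, p <= bound)
```

The bound is tight for a disjunction of a single coordinate, and the probability only decreases as the disjunction meets more of the support.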

Let z be any valid decoding of A[x']. Thus, $(x', z)$ must be $(e'_0, e'_1)$-close.
Now assume that $e_1 \ge (e'_1 + 1)m/(\epsilon d)$ and consider any $x \in X$. Hence, $(A[x], A[x'])$ are
$(e_0, e_1)$-close and $(x, z)$ must be $(e'_0, e'_1)$-close by Definition 1. Regard x, x', z as
the characteristic vectors of sets $X, X', Z \subseteq [n]$, respectively, where $X' \subseteq X$. We
know that $|X \setminus Z| \le e'_1$ and $|X \setminus X'| = e'_1 + 1$. Therefore,

$$|(X \setminus X') \cap Z| = |X \setminus X'| - |X \setminus Z| + |X' \setminus Z| > 0, \qquad (1)$$

and z must take at least one nonzero coordinate from $\mathrm{supp}(x) \setminus \mathrm{supp}(x')$.
Now we construct an $(e'_1 + 1)$-hypergraph H as follows: The vertex set is $[n] \setminus \mathrm{supp}(x')$,
and for every $x \in X$, we put a hyperedge containing $\mathrm{supp}(x) \setminus \mathrm{supp}(x')$.
The density of this hypergraph is at least $1 - \epsilon$, by the fact that $|X| \ge (1 - \epsilon)|S|$.
Now Lemma 18 implies that H has a matching of size at least

$$t := \frac{(1 - \epsilon)(n - d + 1)}{(e'_1 + 1)^2}.$$

As, by (1), supp(z) must contain at least one element from the vertices in each
hyperedge of this matching, we conclude that $|\mathrm{supp}(z) \setminus \mathrm{supp}(x')| \ge t$, and hence, as
$(x', z)$ are $(e'_0, e'_1)$-close, that $e'_0 \ge t$.

C.3 Proof of Lemma 4

For integers $a \ge b > 0$, we use the notation $V(a, b)$ for the volume of a Hamming
ball of radius b in $\mathbb{F}_2^a$. It is given by

$$V(a, b) = \sum_{i=0}^{b} \binom{a}{i} \le 2^{a h(b/a)} \qquad (\text{for } b \le a/2),$$

where $h(\cdot)$ is the binary entropy function, and thus

$$\log V(a, b) \le b \log\frac{a}{b} + (a - b)\log\frac{a}{a - b} = \Theta(b \log(a/b)).$$

Also, denote by $V'(a, b, e_0, e_1)$ the number of vectors in $\mathbb{F}_2^a$ that are $(e_0, e_1)$-close
to a fixed b-sparse vector. Obviously, $V'(a, b, e_0, e_1) \le V(b, e_0) V(a - b, e_1)$. Now
consider any (without loss of generality, deterministic) reconstruction algorithm D and let X denote
the set of all vectors in $\mathbb{F}_2^n$ that it returns for some noiseless encoding; that is,

$$X := \{x \in \mathbb{F}_2^n \mid \exists y \in B,\ x = D(A[y])\},$$

where B is the set of d-sparse vectors in $\mathbb{F}_2^n$. Notice that all vectors in X must be
$(d + e'_0)$-sparse, as they have to be close to the corresponding "correct" decoding.
For each vector $x \in X$ and $y \in B$, we say that x is matching to y if $(y, x)$ are
$(e'_0, e'_1)$-close. A vector $x \in X$ can be matching to at most $v := V'(n, d + e'_0, e'_0, e'_1)$
vectors in B, and we upper bound $\log v$ as follows:

$$\log v \le \log V(n - d - e'_0, e'_1) + \log V(d + e'_0, e'_0) = O(e'_1 \log((n - d - e'_0)/e'_1)) + d + e'_0,$$

where the term inside $O(\cdot)$ is interpreted as zero when $e'_1 = 0$. Moreover, every
$y \in B$ must have at least one matching vector in X, namely, $D(A[y])$. This means
that $|X| \ge |B|/v$, and that

$$\log|X| \ge \log|B| - \log v \ge d \log(n/d) - d - e'_0 - O(e'_1 \log((n - d - e'_0)/e'_1)).$$

Finally, we observe that the number of measurements has to be at least $\log|X|$
to enable D to output all the vectors in X.
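The counting in this proof can be evaluated exactly for concrete sizes. The sketch below (our illustration; the helper names and the sample values of n, d, $e'_0$, $e'_1$ are hypothetical) computes $V(a, b)$, checks the entropy bound for a few $b \le a/2$, and returns $\log|B| - \log v$ with $v$ bounded by $V(d + e'_0, e'_0)\,V(n - d - e'_0, e'_1)$ as in the text, i.e. the lower bound on the number of measurements before the asymptotic simplifications.

```python
from math import comb, log2


def ball_volume(a, b):
    """V(a, b): number of vectors in F_2^a within Hamming distance b of a fixed one."""
    return sum(comb(a, i) for i in range(b + 1))


def binary_entropy(p):
    return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)


def measurement_lower_bound(n, d, e0p, e1p):
    """log2|B| - log2 v, where |B| = C(n, d) and v <= V(d + e0p, e0p) * V(n - d - e0p, e1p);
    e0p and e1p stand for e'_0 and e'_1 (false positives/negatives allowed in the output)."""
    log_B = log2(comb(n, d))
    log_v = log2(ball_volume(d + e0p, e0p)) + log2(ball_volume(n - d - e0p, e1p))
    return log_B - log_v


# The entropy bound V(a, b) <= 2^{a h(b/a)} for b <= a/2:
for a, b in [(20, 3), (50, 10), (64, 32)]:
    assert log2(ball_volume(a, b)) <= a * binary_entropy(b / a) + 1e-9

print(measurement_lower_bound(n=10**4, d=20, e0p=0, e1p=0))    # exact recovery: log2 C(n, d)
print(measurement_lower_bound(n=10**4, d=20, e0p=10, e1p=5))   # approximate recovery
```

Allowing errors in the reconstruction lowers the bound only by $\log v$, which is the source of the $d + e'_0$ and $O(e'_1 \log(\cdot))$ terms above.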
C.4 Proof of Theorem 8

Index the coordinates of S by the elements of $\mathbb{F}_2^t$ and denote the ith coordinate by
$S_i$. Let Y be any random variable with min-entropy at least $t + k'$ distributed on
$\mathbb{F}_2^{\ell+t}$. Define an information-theoretic test $\mathcal{T}\colon \mathbb{F}_2^\ell \times \mathbb{F}_2^t \to \mathbb{F}_2$ as follows: $\mathcal{T}(x, i) = 1$
if and only if $x \in S_i$. Observe that $\Pr[\mathcal{T}(Y) = 1] \le \sum_i |S_i| \cdot 2^{-(t+k')} = \rho(S) 2^{\ell-k'}$,
and that for every vector $w \in (\mathbb{F}_2^\ell)^{2^t}$, $\Pr_{i \sim U_t}[\mathcal{T}(w_i, i) = 1] = \mathrm{Agr}(w, S)$. Now let
the random variable $X = (X_1, \ldots, X_{2^t})$ be uniformly distributed on the codewords
in $\mathrm{LIST}_C(S, \rho(S)2^{\ell-k'} + \epsilon)$ and $Z \sim U_t$. Thus, from Definition 5 we know
that $\Pr_{X,Z}[\mathcal{T}(X_Z, Z) = 1] > \rho(S)2^{\ell-k'} + \epsilon$. As the choice of Y was arbitrary,
this implies that $\mathcal{T}$ is able to distinguish between the distribution of $(X_Z, Z)$ and
any distribution on $\mathbb{F}_2^{\ell+t}$ with min-entropy at least $t + k'$, with bias greater than
$\epsilon$, which by the definition of condensers implies that the min-entropy of X must
be less than k, or $|\mathrm{LIST}_C(S, \rho(S)2^{\ell-k'} + \epsilon)| < 2^k$.
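The two facts about the test $\mathcal{T}$ used in this proof can be checked on a toy mixture. The sketch below is an illustration under our own assumptions (t = 2, $\ell$ = 3, k' = 2, random sets $S_i$, and a flat source Y chosen at random; the names are hypothetical): it computes $\rho(S) = \sum_i |S_i| / 2^{t+\ell}$, confirms $\Pr[\mathcal{T}(Y) = 1] \le \rho(S)2^{\ell-k'}$, and confirms that a uniformly random coordinate makes $\mathcal{T}$ accept a fixed word w with probability exactly $\mathrm{Agr}(w, S)$.

```python
import random
from fractions import Fraction

t, ell, k_prime = 2, 3, 2        # toy seed length t, output length ell, and k'
T, L = 2 ** t, 2 ** ell

random.seed(0)
# A mixture S = (S_1, ..., S_T) of subsets of the alphabet [L] (standing for F_2^ell).
S = [set(random.sample(range(L), random.randint(0, L))) for _ in range(T)]
rho = Fraction(sum(len(s) for s in S), T * L)        # rho(S)


def accepts(x, i):
    """The distinguisher of the proof: accept (x, i) iff x lies in S_i."""
    return x in S[i]


# One flat source Y on [L] x [T]: uniform on 2^(t + k') points, min-entropy t + k'.
points = [(x, i) for x in range(L) for i in range(T)]
support_Y = random.sample(points, 2 ** (t + k_prime))
accept_Y = Fraction(sum(accepts(x, i) for x, i in support_Y), len(support_Y))
assert accept_Y <= rho * 2 ** (ell - k_prime)        # Pr[T(Y) = 1] <= rho(S) 2^(ell - k')

# For a fixed word w, a uniformly random coordinate i gives acceptance Agr(w, S).
w = [random.randrange(L) for _ in range(T)]
agr = Fraction(sum(w[i] in S[i] for i in range(T)), T)
assert Fraction(sum(accepts(w[i], i) for i in range(T)), T) == agr
print(rho, accept_Y, agr)
```

The first inequality holds for any source of min-entropy at least $t + k'$, since each of the $\sum_i |S_i|$ accepted points has probability at most $2^{-(t+k')}$; the flat source is only one convenient instance.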

C.5 Proof of Theorem 9

Denote by $\mathcal{M}$ the adjacency matrix of the codeword graph of C and by M the
number of its rows. It immediately follows from the construction that $M = TL$.
Moreover, notice that the Hamming weight of each column of $\mathcal{M}$ is exactly T.
Let $x \in \mathbb{F}_2^N$ and denote by $y \in \mathbb{F}_2^M$ its encoding, i.e., $y := \mathcal{M}[x]$, and by $\hat{y} \in \mathbb{F}_2^M$
a received word, or a noisy version of y. The encoding of x can be schematically
viewed as follows: The coefficients of x are assigned to the left vertices of the
codeword graph and the encoded bit on each right vertex is the bitwise OR
of the values of its neighbors. The coordinates of x can be seen in one-to-one
correspondence with the codewords of C. Let $X \subseteq C$ be the set of codewords
corresponding to the support of x. The coordinates of the noisy encoding $\hat{y}$ are
indexed by the elements of $[T] \times [L]$, and thus $\hat{y}$ naturally defines a mixture
$S = (S_1, \ldots, S_T)$ over $[L]^T$, where $S_i$ contains j iff $\hat{y}$ at position $(i, j)$ is 1.
Observe that $\rho(S)$ is the relative Hamming weight (denoted below by $\delta(\cdot)$) of $\hat{y}$;
thus,

$$\rho(S) = \delta(\hat{y}) \le \delta(y) + p \le \frac{D}{L} + p = \gamma + p,$$

where the last inequality comes from the fact that the relative weight of each
column of $\mathcal{M}$ is exactly 1/L and that x is D-sparse. Furthermore, from the
assumption we know that the number of false negatives in the measurement is
at most $\nu TL/D = \nu T/\gamma$. Therefore, any codeword in X must have agreement at
least $1 - \nu/\gamma$ with S. This is because S is indeed constructed from a mixture of
the elements in X, modulo false positives (that do not decrease the agreement)
and at most $\nu T/\gamma$ false negatives, each of which can reduce the agreement by at
most 1/T.

Accordingly, we consider a decoder which simply outputs a binary vector $\hat{x}$
supported on the coordinates corresponding to those codewords of C that have
agreement at least $1 - \nu/\gamma$ with S. Clearly, the running time of the decoder
is linear in the size of the measurement matrix. By the discussion above, $\hat{x}$
must include the support of x. Moreover, Theorem 8 applies for our choice of
parameters, implying that the Hamming weight of $\hat{x}$ must be less than K.
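The construction and decoder of this proof can be sketched as follows. This is our illustration, not the paper's implementation: a uniformly random function $C\colon [N] \to [L]^T$ stands in for the induced code of a condenser (so the list-size guarantee of Theorem 8 is not ensured), the noise is injected by hand, and the sizes and helper names are hypothetical. Row $(i, j)$ of the matrix is stored at index $iL + j$, column x has its ones at rows $(i, C(x)_i)$, and the decoder keeps every codeword that agrees with the mixture in at least $T$ minus the false-negative budget positions, i.e. agreement at least $1 - \nu/\gamma$.

```python
import random


def codeword_graph_matrix(code, L):
    """Rows indexed by (i, j) in [T] x [L]; column x has a 1 exactly at rows (i, code[x][i])."""
    T = len(code[0])
    return [[int(code[x][i] == j) for x in range(len(code))]
            for i in range(T) for j in range(L)]


def or_measure(M, support):
    # y = M[x]: each outcome is the OR of the columns in the support of x.
    return [int(any(row[x] for x in support)) for row in M]


def decode(code, y_noisy, L, max_false_negatives):
    """Keep every x whose codeword lies in S_i = {j : y(i, j) = 1} for at least
    T - max_false_negatives coordinates i."""
    T = len(code[0])
    S = [{j for j in range(L) if y_noisy[i * L + j]} for i in range(T)]
    threshold = T - max_false_negatives
    return {x for x, w in enumerate(code)
            if sum(w[i] in S[i] for i in range(T)) >= threshold}


random.seed(7)
N, T, L, D = 200, 12, 16, 3          # toy sizes, chosen arbitrarily
code = [tuple(random.randrange(L) for _ in range(T)) for _ in range(N)]
M = codeword_graph_matrix(code, L)    # T*L = 192 measurements for N = 200 items

support = set(random.sample(range(N), D))
y = or_measure(M, support)
ones = [r for r, b in enumerate(y) if b]
zeros = [r for r, b in enumerate(y) if not b]
y_noisy = list(y)
for r in random.sample(zeros, 10):    # inject 10 false positives
    y_noisy[r] = 1
for r in random.sample(ones, 2):      # inject 2 false negatives
    y_noisy[r] = 0

estimate = decode(code, y_noisy, L, max_false_negatives=2)
print(sorted(support), sorted(estimate))
```

Whatever the code, the estimate always contains the true support, since each false negative lowers any codeword's agreement by at most one position; bounding the number of extra codewords is exactly where the condenser property of Theorem 8 is needed.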

C.6 Proof of Corollary 10

For simplicity, we assume that $N = 2^n$ and $D = 2^d$ for positive integers n
and d. However, it should be clear that this restriction will cause no loss of
generality and can be eliminated with a slight change in the constants behind
the asymptotic notations.

We instantiate the parameters of Theorem 9 using an optimal strong extractor.
If $\nu = 0$, we choose $\gamma, \epsilon$ small constants such that $\gamma + \epsilon < 1 - p$. Otherwise, we
choose $\gamma := \sqrt{\nu}$, which makes $\nu/\gamma = \sqrt{\nu}$, and $\epsilon < 1 - p - 2\sqrt{\nu}$. (One can
easily see that the right-hand side of the latter inequality is positive for $\nu < (1 - p)^2/4$.)
Hence, the condition $p + \nu/\gamma < 1 - \epsilon - \gamma$ required by Theorem 9 is satisfied. Let
$r = 2\log(1/\epsilon) + O(1) = O(1)$ be the entropy loss of the extractor for error $\epsilon$,
and set up the extractor for min-entropy $k = \log D + \log(1/\gamma) + r$, which means
that $K = 2^k = O(D)$ and $L = 2^\ell = D/\gamma = O(D)$. Now we can apply Theorem 9
and conclude that the measurement matrix is $(pM, (\nu/D)M, O(D), 0)$-correcting.
The seed length required by Ext is $t \le \log n + 2\log(1/\epsilon) + O(1)$,
which gives $T = 2^t = O(\log N)$. Therefore, the number of measurements will be
$M = TL = O(D \log N)$.
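The parameter choices in this proof can be traced numerically. The sketch below is a hypothetical calculator, not part of the paper: it sets every $O(1)$ term (in the entropy loss and in the seed length) to zero, which the proof does not do, so only the shape of the computation is meaningful, not the constants. For $\nu > 0$ it takes $\gamma = \sqrt{\nu}$ and picks $\epsilon$ halfway into the allowed range $\epsilon < 1 - p - 2\sqrt{\nu}$, then checks the condition $p + \nu/\gamma < 1 - \epsilon - \gamma$ of Theorem 9.

```python
from math import log2, sqrt


def corollary10_parameters(N, D, p, nu):
    """Hypothetical calculator for the proof of Corollary 10 with all O(1) terms set to 0.
    Returns (gamma, eps, K, L, T, M) for an optimal strong extractor."""
    if nu == 0:
        gamma = eps = (1 - p) / 3             # any small constants with gamma + eps < 1 - p
    else:
        gamma = sqrt(nu)                       # makes nu / gamma = sqrt(nu)
        slack = 1 - p - 2 * sqrt(nu)
        if slack <= 0:
            raise ValueError("need nu < (1 - p)^2 / 4")
        eps = slack / 2                        # any eps < 1 - p - 2 sqrt(nu) would do
    assert p + nu / gamma < 1 - eps - gamma    # condition required by Theorem 9
    n = log2(N)
    r = 2 * log2(1 / eps)                      # entropy loss of the extractor
    k = log2(D) + log2(1 / gamma) + r          # min-entropy the extractor is set up for
    K, L = 2 ** k, D / gamma                   # output length ell = k - r = log(D / gamma)
    T = 2 ** (log2(n) + 2 * log2(1 / eps))     # seed length t = log n + 2 log(1/eps)
    return gamma, eps, K, L, T, T * L


print(corollary10_parameters(N=2**20, D=100, p=0.1, nu=0.04))
```

With these (assumed) zero constants, $T = (\log N)/\epsilon^2$ and $L = D/\gamma$, so $M = TL$ grows as $D \log N$ for fixed $p$ and $\nu$, in line with the corollary.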

C.7 Proof of Corollary 11

We will use the notation of Theorem 9 and apply it using an optimal strong
lossless condenser. Set up the condenser with error $\epsilon := \delta/(2(1 + \delta))$ and
min-entropy k such that $K = 2^k = D/(1 - 2\epsilon)$. As the error is a constant, the overhead
and hence L/K will also be a constant. The seed length is $t = \log(n/\epsilon) + O(1)$,
which makes $T = O(\log N)$. As $L = O(D)$, the number of measurements will
be $M = TL = O(D \log N)$, as desired. Moreover, note that our choice of K will
imply that $K - D = \delta D$. Thus we only need to choose p and $\nu$ appropriately to
satisfy the condition $(p + \gamma)L/K + \nu/\gamma < 1 - \epsilon$, where $\gamma = D/L = K/(L(1 + \delta))$
is a constant, as required by Theorem 9. Substituting for $\gamma$, we get the
condition $pL/K + \nu(1 + \delta)L/K < \delta/(2(1 + \delta))$, which can be satisfied by choosing
p and $\nu$ to be appropriate positive constants.
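The substitution step can be double-checked with exact rational arithmetic. In the hypothetical helper below, L/K is passed in as a free constant (in the proof it is fixed by the overhead of the condenser) and all quantities are `Fraction`s, so the check is exact. It confirms that $K = D(1 + \delta)$ and that the condition of Theorem 9 holds exactly when the reduced condition does, since $\gamma L/K = 1/(1 + \delta)$ and $\nu/\gamma = \nu(1 + \delta)L/K$.

```python
from fractions import Fraction as F


def corollary11_check(delta, L_over_K, p, nu):
    """Exact check of the parameter choice in the proof of Corollary 11 (hypothetical helper)."""
    eps = delta / (2 * (1 + delta))
    K_over_D = 1 / (1 - 2 * eps)                  # equals 1 + delta, i.e. K - D = delta * D
    gamma = 1 / (L_over_K * K_over_D)             # gamma = D / L
    lhs = (p + gamma) * L_over_K + nu / gamma     # condition of Theorem 9 ...
    reduced = p * L_over_K + nu * (1 + delta) * L_over_K   # ... after substituting gamma
    return K_over_D, lhs < 1 - eps, reduced < delta / (2 * (1 + delta))


print(corollary11_check(delta=F(1, 2), L_over_K=F(4), p=F(1, 100), nu=F(1, 200)))
# (Fraction(3, 2), True, True)
```

Because both sides differ by the same exact amount, the two booleans always agree; the check is about the algebra of the substitution, not about any particular condenser.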

D A Combinatorial Lemma

For a positive integer $c > 1$, define a c-hypergraph as a tuple $(V, E)$, where V is
the set of vertices and E is the set of hyperedges, and every $e \in E$ is a subset of
V of size c. The degree of a vertex $v \in V$, denoted by $\deg(v)$, is the size of the set
$\{e \in E : v \in e\}$. Note that $|E| \le \binom{|V|}{c}$ and $\deg(v) \le \binom{|V|-1}{c-1} \le \binom{|V|}{c-1}$. The density of the
hypergraph is given by $|E|/\binom{|V|}{c}$. A vertex cover on the hypergraph is a subset
of vertices that contains at least one vertex from every hyperedge. A matching is
a set of pairwise disjoint hyperedges. It is well known that any dense hypergraph
must have a large matching. Below we reconstruct a proof of this claim.
Proposition 17. Let H be a c-hypergraph such that every vertex cover of H
has size at least k. Then H has a matching of size at least k/c.

Proof. Let M be a maximal matching of H, i.e., a matching that cannot be
extended by adding further hyperedges. Let C be the set of all vertices that
participate in hyperedges of M. Then C has to be a vertex cover, as otherwise
one could add an uncovered hyperedge to M and violate the maximality of M.
Hence, $c|M| = |C| \ge k$, and the claim follows. □
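The proof is constructive: any greedily built maximal matching already yields a vertex cover of size exactly $c|M|$. The Python sketch below illustrates this on a random 3-hypergraph (the sizes, the seed and the helper names are our assumptions).

```python
import random
from itertools import combinations


def greedy_maximal_matching(edges):
    """Scan hyperedges once, keeping each one that is disjoint from those already kept."""
    used, matching = set(), []
    for e in edges:
        if used.isdisjoint(e):
            matching.append(e)
            used |= e
    return matching, used


def is_vertex_cover(cover, edges):
    return all(not cover.isdisjoint(e) for e in edges)


random.seed(3)
c, n = 3, 9
all_triples = [frozenset(s) for s in combinations(range(n), c)]
edges = random.sample(all_triples, 30)          # a random 3-hypergraph on 9 vertices
matching, C = greedy_maximal_matching(edges)
print(len(matching), len(C), is_vertex_cover(C, edges))
```

Every skipped hyperedge met a vertex already in `used`, so the union of the matching covers all hyperedges; hence a minimum vertex cover of size k forces $|M| \ge k/c$.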

Lemma 18. Let $H = (V, E)$ be a c-hypergraph with density at least $\epsilon > 0$. Then
H has a matching of size at least $\frac{\epsilon}{c^2}(|V| - c + 1)$.

Proof. For every subset $S \subseteq V$ of size c, denote by $1(S)$ the indicator value of S
being in E. Let C be any vertex cover of H. Denote by $\mathcal{S}$ the set of all subsets
of V of size c, and let $n := |V|$. Then we have

$$\epsilon \binom{n}{c} \le \sum_{S \in \mathcal{S}} 1(S) \le \sum_{v \in C} \deg(v) \le |C| \binom{n}{c-1}.$$

Hence, $|C| \ge \epsilon(n - c + 1)/c$, and the claim follows using Proposition 17. □
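The counting chain in this proof can be verified by brute force on small hypergraphs. The sketch below is our illustration (n = 8, c = 3, 20 random hypergraphs, and the helper name are arbitrary choices): it computes the exact minimum vertex cover size $\tau$ and checks both $\epsilon\binom{n}{c} \le \tau\binom{n}{c-1}$ and the resulting bound $\tau \ge \epsilon(n - c + 1)/c$, using exact fractions for the density.

```python
import random
from fractions import Fraction
from itertools import combinations
from math import comb


def min_vertex_cover_size(n, edges):
    """Brute force over vertex subsets of [n] in order of increasing size."""
    for k in range(n + 1):
        for cover in combinations(range(n), k):
            chosen = set(cover)
            if all(chosen & e for e in edges):
                return k
    return n


random.seed(5)
n, c = 8, 3
all_edges = [set(e) for e in combinations(range(n), c)]
for _ in range(20):
    edges = random.sample(all_edges, random.randint(1, len(all_edges)))
    density = Fraction(len(edges), comb(n, c))
    tau = min_vertex_cover_size(n, edges)
    # Counting step of the proof: eps * C(n, c) <= |C| * C(n, c - 1) for any cover C.
    assert density * comb(n, c) <= tau * comb(n, c - 1)
    # Hence |C| >= eps (n - c + 1) / c; Proposition 17 then gives a matching of size >= |C| / c.
    assert tau >= density * (n - c + 1) / c
print("bounds hold on 20 random 3-hypergraphs on 8 vertices")
```

Brute force is only feasible for tiny n, but it exercises exactly the inequality chain displayed above, independently of how the cover is found.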
| m | e_0 | e_1 | e'_0 | Det/Rnd | Rec. Time |
|---|---|---|---|---|---|
| $O(d \log n)$ | $\alpha m$ | $\Omega(m/d)$ | $O(d)$ | Rnd | $O(mn)$ |
| $O(d \log n)$ | $\Omega(m)$ | $\Omega(m/d)$ | $\delta d$ | Rnd | $O(mn)$ |
| $O(d^{1+o(1)} \log n)$ | $\alpha m$ | $\Omega(m/d)$ | $O(d)$ | Det | $O(mn)$ |
| $d \cdot \mathrm{quasipoly}(\log n)$ | $\Omega(m)$ | $\Omega(m/d)$ | $\delta d$ | Det | $O(mn)$ |
| $d \cdot \mathrm{quasipoly}(\log n)$ | $\alpha m$ | $\Omega(m/d)$ | $O(d)$ | Det | $\mathrm{poly}(m)$ |
| $\mathrm{poly}(d)\mathrm{poly}(\log n)$ | $\mathrm{poly}(d)\mathrm{poly}(\log n)$ | $\Omega(e_0/d)$ | $\delta d$ | Det | $\mathrm{poly}(m)$ |

Table 1. A summary of constructions in this paper. The parameters $\alpha \in [0, 1)$
and $\delta \in (0, 1]$ are arbitrary constants, m is the number of measurements, $e_0$
(resp., $e_1$) the number of tolerable false positives (resp., negatives) in the
measurements, and $e'_0$ is the number of false positives in the reconstruction. The
fifth column shows whether the construction is deterministic (Det) or randomized
(Rnd), and the last column shows the running time of the reconstruction
algorithm.



